
TULIP software and web server : automatic classification of protein sequences based on pairwise comparisons and Z-value statistics



Abstract

A configuration space of homologous protein sequences (CSHP) has recently been constructed based on pairwise comparisons, with probabilities deduced from Z-value statistics (Monte Carlo methods applied to pairwise comparisons) and following evolutionary assumptions. A Z-value cut-off is applied so that proteins are placed in the CSHP only when the similarity of a pair of sequences is significant according to the Theorem of the Upper Limit of a score Probability (TULIP theorem). Based on the positions of similar protein sequences in the CSHP, a classification can be deduced and visualized as trees, called TULIP trees. In previous case studies, TULIP trees were shown to be consistent with phylogenetic trees. To date, no tool has been made available to compute TULIP trees following this model. Methods that cluster proteins based on pairwise comparisons and evolutionary assumptions should be useful both for evaluation and for the future improvements they might inspire. We developed a web server allowing the local or online computation of TULIP trees based on the CSHP probabilities. The input is a set of homologous protein sequences in multi-FASTA format. Pairwise comparisons are conducted using the Smith-Waterman method, with 100 to 1,000 sequence shufflings to estimate pairwise Z-values. The resulting Z-value matrix is used to infer a tree, which is then written to a file. The output therefore consists of a Z-value matrix, a distance matrix, a TULIP tree file in NEWICK format, and a TULIP tree visualisation. The TULIP server provides an easy-to-use interface to the TULIP software, and allows a classification of protein sequences based on pairwise alignments and following evolutionary assumptions. TULIP trees are consistent with phylogenies in numerous cases, but they can be inconsistent for multi-domain proteins in which some domains have been conserved in all branches. Thus, following the MIAPA (Minimum Information About a Phylogenetic Analysis) recommendations, TULIP trees cannot be considered conventional phylogenetic trees. A major strength of the TULIP classification is its statistical validity when analysing samples that include both compositionally unbiased and biased sequences (i.e. with biased amino acid distributions), such as sequences from Plasmodium falciparum. The TULIP web server is a service of the Malaria Portal of the University of Pretoria, South Africa, and is available at http://malport.bi.up.ac.za/TULIP/
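The Z-value estimation described in the abstract can be illustrated with a minimal Python sketch. It is not the TULIP implementation: the match/mismatch/gap scores, the choice to shuffle only the second sequence, and the function names `smith_waterman_score` and `z_value` are all assumptions made for illustration. The sketch scores the real pair with a basic Smith-Waterman local alignment, rescores against shuffled copies, and reports how many standard deviations the real score lies above the shuffled mean.

```python
import random
import statistics


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (linear gap penalty).

    The scoring values are illustrative assumptions, not TULIP's;
    a real run would normally use a substitution matrix.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def z_value(a, b, n_shuffles=100, rng=None):
    """Monte Carlo Z-value: (observed - mean) / std over shuffled copies of b.

    Shuffling only the second sequence is an assumption of this sketch.
    """
    rng = rng or random.Random(0)
    observed = smith_waterman_score(a, b)
    letters = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # preserves the amino acid composition of b
        shuffled_scores.append(smith_waterman_score(a, "".join(letters)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.pstdev(shuffled_scores)
    if sd == 0:
        # Degenerate case: every shuffle gave the same score.
        return 0.0 if observed == mean else float("inf")
    return (observed - mean) / sd
```

Because shuffling preserves each sequence's composition, the Z-value compares the real score against a background with the same amino acid bias, which is the property the abstract highlights for compositionally biased sequences. Calling `z_value` on every pair from a multi-FASTA input would give a Z-value matrix of the kind the server outputs.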
